Problem
The original motivation for our project is to understand the weather
patterns, and how they have changed, in one of the most globally
relevant cities: London. To that end we chose a project of the 1st
format (using a public dataset). The chosen dataset
(london_weather.csv, obtained from Kaggle at
https://www.kaggle.com/datasets/emmanuelfwerr/london-weather-data)
can be interpreted as a time series, since it records how a fixed set of
variables measured at a single location varies over time (between 1979
and 2021).
The goal of the project is to use machine learning algorithms to
predict the category of two of the dataset's variables (precipitation
and snow) for a new observation given to the model. The categories are
“rains” or “does not rain” and “snows” or “does not snow”. The plan is
to build 3 distinct models (Logistic Regression, KNN and Decision
Tree), compare them using accuracy-related metrics, and see which one
has the best predictive performance.
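The comparison described above could be sketched as follows. This is only an illustration under stated assumptions: it uses scikit-learn, a synthetic two-feature dataset instead of london_weather.csv, and hyperparameters (`n_neighbors=5`, `max_depth=4`, a 25% test split) chosen arbitrarily; the helper name `compare_models` is hypothetical and is not taken from the project repository.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier


def compare_models(X, y, seed=0):
    """Fit the three classifiers and return the test accuracy of each."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed, stratify=y
    )
    # Hyperparameters are illustrative assumptions, not tuned values.
    models = {
        "logistic_regression": LogisticRegression(max_iter=1000),
        "knn": KNeighborsClassifier(n_neighbors=5),
        "decision_tree": DecisionTreeClassifier(max_depth=4, random_state=seed),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = accuracy_score(y_test, model.predict(X_test))
    return scores


# Synthetic stand-in for the weather features (NOT the real dataset):
# two numeric features and a binary label playing the role of "rains".
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 2))
noise = rng.normal(scale=0.5, size=400)
y = (X[:, 0] + 0.5 * X[:, 1] + noise > 0).astype(int)

scores = compare_models(X, y)
for name, acc in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: accuracy = {acc:.3f}")
```

The same function would be called once per target (rain and snow), each time with the corresponding binary label as `y`.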
Data
The chosen dataset was created by combining measurements obtained from
requests for individual weather attributes provided by the European
Climate Assessment (ECA). The measurements in this particular dataset
were recorded by the weather station near Heathrow Airport in London,
United Kingdom. The original size of the dataset, along with a list of
its attributes and their descriptions, is given below:
london_weather.csv - 15341 observations x 10 attributes:
date - date on which the measurement was taken - (int)
cloud_cover - cloud cover measured in oktas - (float)
sunshine - sunshine duration measured in hours (hrs) - (float)
global_radiation - irradiance measured in watts per square meter (W/m2) - (float)
max_temp - maximum temperature recorded in degrees Celsius (°C) - (float)
mean_temp - mean temperature recorded in degrees Celsius (°C) - (float)
min_temp - minimum temperature recorded in degrees Celsius (°C) - (float)
precipitation - precipitation measured in millimeters (mm) - (float)
pressure - pressure measured in Pascals (Pa) - (float)
snow_depth - snow depth measured in centimeters (cm) - (float)
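To make the layout above concrete, the following sketch loads a few rows with pandas and derives the two binary targets. It rests on several assumptions that are not stated in the text: the sample rows are invented for illustration, the integer `date` is taken to be encoded as YYYYMMDD, and any measurement above zero is treated as the positive class. The column names `rain` and `snow` are hypothetical.

```python
import io

import pandas as pd

# Invented sample rows in the column layout listed above
# (these values are NOT taken from london_weather.csv).
sample_csv = io.StringIO(
    "date,cloud_cover,sunshine,global_radiation,max_temp,mean_temp,"
    "min_temp,precipitation,pressure,snow_depth\n"
    "19790101,4.0,3.5,40.0,3.0,0.5,-2.0,1.2,101500.0,2.0\n"
    "19790102,7.0,0.0,15.0,6.0,4.0,2.0,0.0,100800.0,0.0\n"
    "19790103,2.0,6.1,55.0,9.5,6.0,2.5,3.4,102300.0,0.0\n"
)

df = pd.read_csv(sample_csv)

# Assumption: the integer date is encoded as YYYYMMDD.
df["date"] = pd.to_datetime(df["date"].astype(str), format="%Y%m%d")

# Assumption: any positive measurement counts as the positive class.
df["rain"] = (df["precipitation"] > 0).astype(int)
df["snow"] = (df["snow_depth"] > 0).astype(int)

print(df.dtypes)
print(df[["date", "precipitation", "rain", "snow_depth", "snow"]])
```

With the real file, `sample_csv` would be replaced by the path to london_weather.csv, and rows with missing measurements would need to be handled before the labels are computed.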
Data processing
#Entire Code of ETL data
Dataframe used
Showing the first 10000 rows of the processed dataframe.
Models
First we need a baseline solution for the proposed problem. Intuition
and common sense suggest that very cold days are more likely to have
snow, while warmer days are more likely to have rain and no snow. We
take this as our baseline solution, which will be compared against the
classification models mentioned above (Logistic Regression, Decision
Tree and KNN).
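One way to turn this intuition into a baseline that can be scored is a pair of temperature thresholds. The sketch below is only an assumption about how the rule might be encoded: the text does not give thresholds, so `snow_max_temp=0.0` and `rain_min_temp=10.0` are hypothetical, as are the function names and the invented observations.

```python
def baseline_predict(mean_temp, snow_max_temp=0.0, rain_min_temp=10.0):
    """Return (rain, snow) guesses from the mean temperature alone."""
    snow = mean_temp <= snow_max_temp  # very cold day: guess snow
    rain = mean_temp >= rain_min_temp  # warmer day: guess rain, no snow
    return rain, snow


def accuracy(predicted, actual):
    """Fraction of positions where the prediction matches the label."""
    matches = sum(p == a for p, a in zip(predicted, actual))
    return matches / len(actual)


# Invented observations: (mean_temp in degrees Celsius, rained, snowed).
days = [
    (-3.0, False, True),
    (-1.0, False, False),
    (4.0, False, False),
    (12.0, True, False),
    (18.0, False, False),
]

predictions = [baseline_predict(temp) for temp, _, _ in days]
rain_acc = accuracy([p[0] for p in predictions], [d[1] for d in days])
snow_acc = accuracy([p[1] for p in predictions], [d[2] for d in days])
print(f"rain accuracy: {rain_acc:.2f}, snow accuracy: {snow_acc:.2f}")
```

Scoring the baseline with the same accuracy measure used for the classifiers keeps the comparison direct.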
Logistic Regression
Decision Tree
KNN (K-Nearest Neighbors)
Comparing models
Conclusion
All code for the project can be found in this repository: https://github.com/bmorbin/ML_Project.